Purpose of this quarto-html
- To explore the dataset from Zhang et al.
- To determine whether there is a link between weather and train delays.
Dimensions
[1] 2751713 16
Column names
[1] "date" "train_number"
[3] "train_direction" "station_name"
[5] "station_order" "scheduled_arrival_time"
[7] "scheduled_departure_time" "stop_time"
[9] "actual_arrival_time" "actual_departure_time"
[11] "arrival_delay" "departure_delay"
[13] "wind" "weather"
[15] "temperature" "major_holiday"
Summary statistics
| station_name | Mean_arriv | Mean_depar | stdev_arriv | stdev_delay | n | unique_arriv | unique_dep | Mean_temp |
|---|---|---|---|---|---|---|---|---|
| Jianwei Railway Station | 532.0000 | 532.0000 | 0.00000 | 0.00000 | 29 | 1 | 1 | 10.793103 |
| Yuzhou Railway Station | 531.4444 | 531.4444 | 75.67617 | 75.67617 | 36 | 3 | 3 | 5.805556 |
| Guanyun Railway Station | 500.0132 | 500.0132 | 178.21474 | 178.21474 | 151 | 9 | 9 | 6.622517 |
| Fangcheng Railway Station | 485.4444 | 485.4444 | 75.67617 | 75.67617 | 36 | 3 | 3 | 6.055556 |
| Jieshounan Railway Station | 465.2176 | 465.2176 | 359.13111 | 359.13111 | 239 | 18 | 18 | 5.941423 |
| Xingandong Railway Station | 416.3605 | 416.3605 | 212.42152 | 212.42152 | 147 | 10 | 10 | 11.476190 |

| train_number | Mean_arriv | Mean_depar | stdev_arriv | stdev_delay | n | unique_arriv | unique_dep | Mean_temp |
|---|---|---|---|---|---|---|---|---|
| G4027 | 853.1429 | 696.7143 | 382.2505 | 542.9797 | 7 | 7 | 7 | 28.85714 |
| G4919 | 840.0000 | 422.6667 | 653.4977 | 701.7814 | 6 | 3 | 3 | 23.66667 |
| G4950 | 826.5000 | 642.6667 | 410.2257 | 578.8743 | 6 | 6 | 6 | 22.66667 |
| G9252 | 811.0000 | 722.3077 | 253.9600 | 418.3552 | 13 | 13 | 13 | 19.92308 |
| G4923 | 801.0000 | 531.0000 | 534.6631 | 640.0818 | 4 | 4 | 4 | 24.75000 |
| G4966 | 661.2500 | 447.7500 | 411.7026 | 502.2102 | 8 | 4 | 4 | 22.00000 |
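The per-group columns in the tables above can be reproduced with a grouped summary. The following is a minimal sketch in base R on made-up toy data, not the code behind the report. It assumes that `stdev_delay` is the standard deviation of `departure_delay` and that `unique_arriv` and `unique_dep` count distinct delay values; `delays` and `summarise_group` are hypothetical names.

```r
# Toy data standing in for the real dataset (all values are made up).
delays <- data.frame(
  station_name    = c("A", "A", "A", "B", "B"),
  arrival_delay   = c(0, 5, 5, 2, 4),
  departure_delay = c(1, 5, 6, 2, 2),
  temperature     = c(10, 12, 14, 6, 8)
)

# One summary row per group (assumed meaning of each column, see lead-in).
summarise_group <- function(d) {
  data.frame(
    Mean_arriv   = mean(d$arrival_delay),
    Mean_depar   = mean(d$departure_delay),
    stdev_arriv  = sd(d$arrival_delay),
    stdev_delay  = sd(d$departure_delay),
    n            = nrow(d),
    unique_arriv = length(unique(d$arrival_delay)),
    unique_dep   = length(unique(d$departure_delay)),
    Mean_temp    = mean(d$temperature)
  )
}

# Split by station, summarise each piece, and bind the rows together.
groups <- split(delays, delays$station_name)
stats  <- do.call(rbind, lapply(groups, summarise_group))
by_station <- data.frame(station_name = names(groups), stats, row.names = NULL)

# Sort by mean arrival delay, largest first, and show the top rows.
by_station <- by_station[order(-by_station$Mean_arriv), ]
print(head(by_station))
```

Grouping by `train_number` instead of `station_name` would give the second table's layout.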
Average departure delays
Looking at the data (rather than the summary statistics)
By number of departures
Basic analysis of the relation between weather and departure delays
Call:
aov(formula = departure_delay ~ weather, data = subset3)
Residuals:
| Min | 1Q | Median | 3Q | Max |
|---|---|---|---|---|
| -6.222 | -0.775 | -0.537 | 0.415 | 78.415 |

Coefficients:
| Term | Estimate | Std. Error | t value | Pr(>\|t\|) | Signif. |
|---|---|---|---|---|---|
| (Intercept) | -0.41481 | 0.04576 | -9.066 | < 2e-16 | *** |
| weatherdownpour | -0.13064 | 1.03119 | -0.127 | 0.89919 | |
| weatherfog | -0.09265 | 0.41992 | -0.221 | 0.82538 | |
| weatherhaze | 0.41481 | 0.85540 | 0.485 | 0.62773 | |
| weatherheavy snow | 4.18404 | 0.54902 | 7.621 | 2.67e-14 | *** |
| weatherlight rain | 0.44859 | 0.09085 | 4.938 | 7.99e-07 | *** |
| weatherlight snow | 0.24500 | 0.23908 | 1.025 | 0.30549 | |
| weatherlight to moderate rain | 0.06270 | 0.40806 | 0.154 | 0.87788 | |
| weathermoderate rain | 2.02675 | 0.29868 | 6.786 | 1.20e-11 | *** |
| weathermoderate snow | 0.27592 | 0.57129 | 0.483 | 0.62911 | |
| weathermoderate to heavy snow | 5.63704 | 1.13982 | 4.946 | 7.67e-07 | *** |
| weatherovercast | 0.18985 | 0.08799 | 2.158 | 0.03096 | * |
| weathershowers | -0.08519 | 0.55615 | -0.153 | 0.87826 | |
| weathersleet | 1.52976 | 0.26303 | 5.816 | 6.15e-09 | *** |
| weathersnow showers | 1.41481 | 0.54902 | 2.577 | 0.00998 | ** |
| weathersunny | -0.04837 | 0.06704 | -0.721 | 0.47062 | |
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.417 on 15230 degrees of freedom
Multiple R-squared: 0.01255, Adjusted R-squared: 0.01158
F-statistic: 12.91 on 15 and 15230 DF, p-value: < 2.2e-16
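Output of this shape (an `aov()` call followed by a coefficient table) is what `summary.lm()` prints for an `aov` fit. The sketch below shows this on simulated data. The data frame `toy`, its three weather levels, and the simulated delays are made up for illustration and are not the report's `subset3`. Each coefficient is the difference in mean delay between that weather level and the baseline level, which R takes to be the first factor level (alphabetical by default).

```r
set.seed(1)

# Simulated stand-in for the real data: three weather levels, 50 rows each.
toy <- data.frame(
  weather = factor(rep(c("cloudy", "heavy snow", "sunny"), each = 50))
)
# True group means: cloudy 0, heavy snow 4, sunny 0 (made-up values).
toy$departure_delay <- rnorm(150, mean = c(0, 4, 0)[as.integer(toy$weather)], sd = 1)

# One-way ANOVA of delay on weather.
fit <- aov(departure_delay ~ weather, data = toy)

# summary(fit) gives the ANOVA table; summary.lm(fit) gives the
# regression-style table with one row per non-baseline level.
print(summary(fit))
coefs <- summary.lm(fit)$coefficients
print(coefs)
```

In this toy fit the intercept equals the mean delay of the baseline group ("cloudy"), and the `weatherheavy snow` row estimates how much later trains depart in heavy snow relative to that baseline.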
Call:
aov(formula = departure_delay ~ wind_strength, data = subset3)
Residuals:
| Min | 1Q | Median | 3Q | Max |
|---|---|---|---|---|
| -3.968 | -0.692 | -0.668 | 0.332 | 78.308 |

Coefficients:
| Term | Estimate | Std. Error | t value | Pr(>\|t\|) | Signif. |
|---|---|---|---|---|---|
| (Intercept) | -0.1949 | 0.1413 | -1.379 | 0.1678 | |
| wind_strengthfresh breeze from the | -0.3315 | 0.2169 | -1.528 | 0.1264 | |
| wind_strengthgentle breeze from the | -0.1131 | 0.1510 | -0.749 | 0.4536 | |
| wind_strengthlight winds | -0.3197 | 0.1673 | -1.911 | 0.0560 | . |
| wind_strengthlight winds from the | -0.1373 | 0.1470 | -0.934 | 0.3504 | |
| wind_strengthmoderate breeze from the | 0.3824 | 0.1692 | 2.260 | 0.0238 | * |
| wind_strengthmoderate gale from the | -0.4718 | 1.9867 | -0.237 | 0.8123 | |
| wind_strengthstrong breeze from the | 1.1632 | 0.4549 | 2.557 | 0.0106 | * |
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.432 on 15238 degrees of freedom
Multiple R-squared: 0.002933, Adjusted R-squared: 0.002475
F-statistic: 6.404 on 7 and 15238 DF, p-value: 1.514e-07
Welch Two Sample t-test
data: departure_delay by wind_strength2
t = -3.9443, df = 1532, p-value = 8.363e-05
alternative hypothesis: true difference in means between group light winds and group strong winds is not equal to 0
95 percent confidence interval:
-0.8456282 -0.2839107
sample estimates:
| mean in group light winds | mean in group strong winds |
|---|---|
| -0.3445731 | 0.2201964 |
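A two-group comparison like the one above can be sketched as follows, on simulated data. How `wind_strength2` was derived in the report is not shown, so the grouping here (moderate and strong breezes counted as "strong winds", everything else as "light winds") is an assumption, as are the data frame `toy` and the vector `strong_levels`. `t.test()` defaults to `var.equal = FALSE`, which is the Welch test.

```r
set.seed(2)

# Simulated stand-in: four wind categories, 40 rows each (made-up values).
toy <- data.frame(
  wind_strength = rep(c("light winds", "gentle breeze from the",
                        "moderate breeze from the", "strong breeze from the"),
                      each = 40)
)
toy$departure_delay <- rnorm(
  160,
  mean = ifelse(grepl("moderate|strong", toy$wind_strength), 0.5, -0.3)
)

# Hypothetical collapse of the wind categories into two groups.
strong_levels <- c("moderate breeze from the", "strong breeze from the")
toy$wind_strength2 <- factor(
  ifelse(toy$wind_strength %in% strong_levels, "strong winds", "light winds")
)

# Welch two-sample t-test (unequal variances is the default).
res <- t.test(departure_delay ~ wind_strength2, data = toy)
print(res)
```

The sign of the t statistic follows the factor level order: with "light winds" as the first level, a negative t means the light-wind group has the smaller mean delay.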
Focus on the stations and on actual (positive) departure delays; the data are filtered accordingly.
Stations with few unique departure-delay values are deselected, as their delays are more likely due to other circumstances.
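The two filtering steps described above can be sketched in base R as follows. The toy data, the threshold `min_unique`, and the choice to count unique values after dropping non-positive delays are all assumptions made for illustration; the report does not state the cut-off it uses.

```r
# Toy data standing in for the real dataset (values are made up).
toy <- data.frame(
  station_name    = c("A", "A", "A", "A", "B", "B", "B", "C", "C"),
  departure_delay = c(1, 3, 7, -1, 2, 2, 2, 0, 5)
)

min_unique <- 3  # hypothetical threshold for "few unique departure delays"

# Step 1: keep only actual (positive) departure delays.
late <- toy[toy$departure_delay > 0, ]

# Step 2: count distinct delay values per station and keep stations
# that reach the threshold.
n_unique <- tapply(late$departure_delay, late$station_name,
                   function(x) length(unique(x)))
keep <- names(n_unique)[n_unique >= min_unique]
filtered <- late[late$station_name %in% keep, ]

print(n_unique)
print(filtered)
```

Here station A has three distinct positive delays and is kept, while B (one repeated value) and C (a single positive delay) are dropped.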